Adds PropagationNet GNN layer and makes it optionally usable in existing deterministic models #507
Conversation
- Update test_datasets.py to use ForecasterModule instead of GraphLAM
- Update test_plotting.py to use ForecasterModule instead of GraphLAM
- Fix interior_mask_bool property shape (1,) -> (N,) for correct loss masking
- Fix all_gather_cat to handle single-device runs without incorrect dim collapse
…r hierarchy
- Replace opaque argparse.Namespace with explicit keyword arguments in StepPredictor, BaseGraphModel, BaseHiGraphModel, GraphLAM, HiLAM, and HiLAMParallel __init__ methods
- Reorder methods in step_predictor.py: forward/expand_to_batch now appear before clamping methods
- Update all instantiation sites (train_model.py, test_training.py, test_prediction_model_classes.py) to pass explicit kwargs
- HiLAM helper methods (make_same/up/down_gnns) now use self.hidden_dim and self.hidden_layers instead of args parameter

Addresses review comments on PR mllam#208.
- Rename border to boundary in Forecaster
- Pass Forecaster object to ForecasterModule init instead of Predictor
- Remove inline imports in ForecasterModule
- Move loss-related pred_std logic fully into ForecasterModule
- Delete obsolete test_refactored_hierarchy.py
Co-authored-by: Joel Oskarsson <joel.oskarsson@outlook.com>
- Add predicts_std property to StepPredictor, Forecaster and ARForecaster so ForecasterModule can query the forecaster instead of taking output_std as a separate constructor argument
- Remove output_std parameter from ForecasterModule; use self._forecaster.predicts_std throughout
- Move fallback per_var_std logic out of forecast_for_batch into each step method so pred_std is None before fallback, enabling direct None checks instead of hparam checks
- Replace len(datastore.boundary_mask) with datastore.num_grid_points in StepPredictor to avoid relying on boundary_mask
- Move get_state_feature_weighting and ARForecaster inline imports to module-level imports in forecaster_module.py and train_model.py
- Fix statement ordering in StepPredictor.__init__ so register_buffer for grid_static_features appears directly after building the tensor
- Replace dict+loop pattern for registering state_mean/state_std buffers with two direct register_buffer calls
- Remove all internal Item N checklist references from comments
- Remove TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD env var hack; pass weights_only=False explicitly to load_from_checkpoint calls and weights_only=True to torch.load in test_graph_creation.py
- Add test_step_predictor_no_static_features to verify models initialise and run correctly when the datastore returns None for static features
- Fix graph= -> graph_name= and model.forecaster -> model._forecaster in tests to match current API
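The predicts_std delegation described in the first two bullets can be sketched as below. This is an illustrative, stripped-down version with hypothetical class bodies, not the actual neural-lam implementation; only the names StepPredictor, ARForecaster, ForecasterModule, and predicts_std come from the PR.

```python
# Sketch (hypothetical bodies) of the predicts_std property pattern:
# ForecasterModule queries the forecaster rather than taking an
# output_std constructor argument of its own.
class StepPredictor:
    def __init__(self, output_std=False):
        self._output_std = output_std

    @property
    def predicts_std(self):
        return self._output_std


class ARForecaster:
    def __init__(self, step_predictor):
        self._step_predictor = step_predictor

    @property
    def predicts_std(self):
        # Delegate to the underlying single-step predictor
        return self._step_predictor.predicts_std


class ForecasterModule:
    def __init__(self, forecaster):
        self._forecaster = forecaster

    def loss_inputs(self, pred_mean, pred_std, per_var_std):
        # pred_std is None before the fallback, so a direct None
        # check suffices instead of inspecting hparams
        if pred_std is None:
            pred_std = per_var_std
        return pred_mean, pred_std
```

With this arrangement a single `self._forecaster.predicts_std` check replaces the former constructor flag throughout the module.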
…r_batch

Makes the forecasting path tolerant to batch-folded execution, so that future ensemble generation can fold (S, B) into (S*B) before calling ARForecaster, without any changes to ARForecaster or StepPredictor. The prediction is kept folded through the existing deterministic logging and aggregation paths, so all dim assumptions in training_step, validation_step, and test_step remain correct. Unfolding to (*leading, T, N, F) is deferred to ensemble-specific subclasses (e.g. EnsForecasterModule). Adds test_fold_unfold_equivalence to confirm ARForecaster's rollout is rank-transparent under a pre-entry fold.
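The fold/unfold convention above amounts to a pair of reshapes around the rollout. A minimal sketch, shown with NumPy for brevity (`torch.Tensor.reshape` behaves identically); the shape names and the comment about ARForecaster are taken from the commit message, everything else is illustrative:

```python
import numpy as np

# S ensemble samples, B batch members, T time steps, N grid nodes, F features
S, B, T, N, F = 2, 3, 4, 5, 6
forecast_in = np.arange(S * B * T * N * F, dtype=np.float64).reshape(S, B, T, N, F)

# Pre-entry fold: (S, B, ...) -> (S*B, ...); the rollout then treats
# S*B as a plain batch dimension (this is what "rank-transparent" means)
folded = forecast_in.reshape(S * B, T, N, F)

# ... ARForecaster rollout would act on `folded` here ...

# Ensemble-side unfold, deferred to subclasses in the PR
unfolded = folded.reshape(S, B, T, N, F)
```

Because reshape is a pure view of the same memory layout, folding and unfolding round-trips exactly, which is what test_fold_unfold_equivalence checks for the real rollout.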
…stic models
- Port PropagationNet as InteractionNet subclass (mean aggr, sender residual in messages, aggregation residual in forward)
- Add --vertical_propnets CLI flag to select PropagationNet for grid-mesh and vertical message passing edges
- Wire flag through model hierarchy: BaseGraphModel (g2m/m2g), BaseHiGraphModel (mesh init), HiLAM (up GNNs)
- Add 13 tests covering unit behavior and backward compatibility
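The three differences listed in the first bullet (mean aggregation, sender residual in messages, aggregation residual in the node update) can be illustrated with a toy NumPy step. This is a sketch of the described update rule, not the PyG-based implementation; all function names here are hypothetical, and the MLPs are passed in as plain callables:

```python
import numpy as np

def propnet_step(h_rec, h_send, edge_index, edge_mlp, aggr_mlp):
    """Toy PropagationNet-style update on (num_nodes, dim) arrays."""
    senders, receivers = edge_index  # each of shape (E,)
    # Edge update from concatenated (receiver, sender) states
    edge_in = np.concatenate([h_rec[receivers], h_send[senders]], axis=-1)
    # Sender residual in messages: m_ij = edge_mlp(...) + h_send
    messages = edge_mlp(edge_in) + h_send[senders]
    # MEAN aggregation per receiver (InteractionNet sums instead)
    aggr = np.zeros_like(h_rec)
    counts = np.zeros(len(h_rec))
    np.add.at(aggr, receivers, messages)
    np.add.at(counts, receivers, 1.0)
    aggr = aggr / np.maximum(counts, 1.0)[:, None]
    # Residual added to the AGGREGATION, not the old receiver state
    return aggr + aggr_mlp(np.concatenate([h_rec, aggr], axis=-1))
```

Swapping the last line for `h_rec + update` and the mean for a sum would recover the InteractionNet-style update, which is why the PR can express PropagationNet as a subclass.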
@joeloskarsson @sadamov @observingClouds please have a look and see if this qualifies as the next step in ensemble prep 😄. Would be grateful for your feedback!
@joeloskarsson @sadamov I would appreciate your further review on this PR!
joeloskarsson
left a comment
Thanks for looking at this, and sorry for being slow with reviews 😅 Shared some first thoughts here on how I think we can best integrate this. Happy to hear input also from others, as there are some non-trivial design choices around this (e.g. how to choose the GNN type for each sub-graph).
```python
num_past_forcing_steps: int = 1,
num_future_forcing_steps: int = 1,
output_std: bool = False,
vertical_propnets: bool = False,
```
I think we need more fine-grained options for where to use propnets here (g2m, m2g, both). We should also make this more future-proof by not making it a boolean inet/propnet, but rather having the argument be the GNN type to use (as a string? enum? not sure about best design).
This goes also for other model classes with this argument.
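One possible shape for this suggestion, sketched under the assumption that the argument becomes a per-sub-graph string parsed into an enum (all names here are hypothetical, since the reviewer explicitly leaves the design open):

```python
from enum import Enum

class GNNType(str, Enum):
    """GNN variant selectable per sub-graph instead of a single boolean."""
    INTERACTION = "interaction"
    PROPAGATION = "propagation"


def select_gnn_types(g2m="interaction", m2g="interaction"):
    """Parse per-edge-set GNN choices, e.g. from argparse string values."""
    return {"g2m": GNNType(g2m), "m2g": GNNType(m2g)}
```

A string-backed enum keeps the CLI surface human-readable (`--g2m_gnn propagation`) while leaving room for future GNN variants beyond the inet/propnet pair.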
```python
# Always concatenate to [rec_nodes, send_nodes] for propagation,
# but only aggregate to rec_nodes
node_reps = torch.cat((rec_rep, send_rep), dim=-2)
edge_rep_aggr, edge_diff = self.propagate(
    self.edge_index, x=node_reps, edge_attr=edge_rep
)
rec_diff = self.aggr_mlp(
    torch.cat((rec_rep, edge_rep_aggr), dim=-1)
)

# Residual connections
rec_rep = edge_rep_aggr + rec_diff  # residual is to aggregation

if self.update_edges:
    edge_rep = edge_rep + edge_diff
    return rec_rep, edge_rep

return rec_rep
```
There is quite a lot of code repeated from InteractionNet's forward here. Could this be refactored to avoid the repetition, while keeping the differences between the two classes clear?
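One way to act on this comment is a template-method split: keep the shared forward in the base class and override only the residual hook where the two layers differ. A minimal sketch with hypothetical names (the real classes would keep their MessagePassing machinery; `propagate_fn` here is a stand-in callable, and the PropagationNet update MLP is elided):

```python
class BaseGNNSketch:
    """Shared forward; subclasses differ only in the residual hook."""
    update_edges = False

    def __init__(self, propagate_fn):
        # propagate_fn returns (aggregated messages, edge update)
        self.propagate_fn = propagate_fn

    def forward(self, rec_rep, edge_rep):
        edge_rep_aggr, edge_diff = self.propagate_fn(rec_rep, edge_rep)
        rec_rep = self.node_residual(rec_rep, edge_rep_aggr)
        if self.update_edges:
            return rec_rep, edge_rep + edge_diff
        return rec_rep

    def node_residual(self, rec_rep, edge_rep_aggr):
        raise NotImplementedError


class InteractionSketch(BaseGNNSketch):
    def node_residual(self, rec_rep, edge_rep_aggr):
        # Residual to the previous receiver state
        return rec_rep + edge_rep_aggr


class PropagationSketch(BaseGNNSketch):
    def node_residual(self, rec_rep, edge_rep_aggr):
        # Residual to the aggregation itself
        rec_diff = 0.0  # stands in for aggr_mlp([rec_rep, edge_rep_aggr])
        return edge_rep_aggr + rec_diff
```

This keeps the edge-update and return paths in one place while making the single behavioural difference explicit in each subclass.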
Describe your changes
Adds a `PropagationNet` GNN layer and makes it optionally usable in existing deterministic models, as outlined in #62. It is integrated into the existing model hierarchy from #208 and can be enabled via the `vertical_propnets` flag. Depends on #208.
For changes on top of #208 only, see:
Sir-Sloth-The-Lazy/neural-lam@refactor/model-class-hierarchy-issue-49...refactor/batch-fold-ensemble-prep
Issue Link
Contributes to #62
Type of change
Checklist before requesting a review
`pull` with `--rebase` option if possible).
Checklist for reviewers
Each PR comes with its own improvements and flaws. The reviewer should check the following:
Author checklist after completed review
reflecting type of change (add section where missing):
Checklist for assignee